37 research outputs found

    CRoute: a fast high-quality timing-driven connection-based FPGA router

    Get PDF
    FPGA routing is an important part of physical design as the programmable interconnection network requires the majority of the total silicon area and the connections largely contribute to delay and power. It should also occur with minimum runtime to enable efficient design exploration. In this work we elaborate on the concept of the connection-based routing principle. The algorithm is improved and a timing-driven version is introduced. The router, called CROUTE, is implemented in an easy to adapt FPGA CAD framework written in Java, which is publicly available on GitHub. Quality and runtime are compared to the state-of-the-art router in VPR 7.0.7. Benchmarking is done with the TITAN23 design suite, which consists of large heterogeneous designs targeted to a detailed representation of the Stratix IV FPGA. CROUTE gains in both the total wirelength and maximum clock frequency while reducing the routing runtime. The total wire-length reduces by 11% and the maximum clock frequency increases by 6%. These high-quality results are obtained in 3.4x less routing runtime

    New FPGA design tools and architectures

    Get PDF

    A novel tool flow for increased routing configuration similarity in multi-mode circuits

    Get PDF
    A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration (RTR) of an FPGA, all the modes can be time-multiplexed on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration of the reconfigurable region for every mode separately. This results in configurations that are bit-wise very different. Thus, in this case, many bits need to be changed in the configuration memory to switch between modes, leading to long reconfiguration times. In this paper we present a novel tool flow that retains the placement of the conventional RTR flow, but uses TRoute, a reconfiguration-aware connection router, to implement the connections of all modes simultaneously. TRoute stimulates the sharing of routing resources between connections of different modes. This results in a significant increase in the similarity between the routing configurations of the modes. In the experimental results it is shown that the number of routing configuration bits that needs to be rewritten is reduced with a factor between 2 and 4 compared to conventional techniques

    A fully parameterized virtual coarse grained reconfigurable array for high performance computing applications

    Get PDF
    Field Programmable Gate Arrays (FPGAs) have proven their potential in accelerating High Performance Computing (HPC) Applications. Conventionally such accelerators predominantly use, FPGAs that contain fine-grained elements such as LookUp Tables (LUTs), Switch Blocks (SB) and Connection Blocks (CB) as basic programmable logic blocks. However, the conventional implementation suffers from high reconfiguration and development costs. In order to solve this problem, programmable logic components are defined at a virtual higher abstraction level. These components are called Processing Elements (PEs) and the group of PEs along with the inter-connection network form an architecture called a Virtual Coarse-Grained Reconfigurable Array (VCGRA). The abstraction helps to reconfigure the PEs faster at the intermediate level than at the lower-level of an FPGA. Conventional VCGRA implementations (built on top of the lower levels of the FPGA) use functional resources such as LUTs to establish required connections (intra-connect) within a PE. In this paper, we propose to use the parameterized reconfiguration technique to implement the intra-connections of each PE with the aim to reduce the FPGA resource utilization (LUTs). The technique is used to parameterize the intra-connections with parameters that only change their value infrequently (whenever a new VCGRA function has to be reconfigured) and that are implemented as constants. Since the design is optimized for these constants at every moment in time, this reduces the resource utilization. Further, interconnections (network between the multiple PEs) of the VCGRA grid can also be parameterized so that both the inter- and intraconnect network of the VCGRA grid can be mapped onto the physical switch blocks of the FPGA. For every change in parameter values a specialized bitstream is generated on the fly and the FPGA is reconfigured using the parameterized run-time reconfiguration technique. Our results show a drastic reduction in FPGA LUT resource utilization in the PE by at least 30% and in the intra-network of the PE by 31% when implementing an HPC application

    Maximizing the reuse of routing resources in a reconfiguration-aware connection router

    Get PDF
    Parameterised configurations for FPGAs are configuration bitstreams of which some of the bits are defined as Boolean functions of parameters. By evaluating these Boolean functions using different parameter values, it is possible to quickly and efficiently derive specialised configuration bitstreams with different properties. Generating and using parameterized configurations requires a new tool flow. In this paper we propose a novel algorithm for the routing step of this tool flow. This new router, called the connection bundle router, is able to route a circuit with parameterized interconnections. It produces routing solutions in less time (up to a factor 5,2) and with a better quality in terms of number of wires (up to 38%) and minimum track width (up to 25%) than its predecessors. The connection bundle router is fully automated and uses a scalable connection-based representation for the parameterized interconnections in a tunable circuit

    A connection-based router for FPGAs

    Get PDF
    The FPGA's interconnection network not only requires the larger portion of the total silicon area in comparison to the logic available on the FPGA, it also contributes to the majority of the delay and power consumption. Therefore it is essential that routing algorithms are as efficient as possible. In this work the connection router is introduced. It is capable of partially ripping up and rerouting the routing trees of nets. To achieve this, the main congestion loop rips up and reroutes connections instead of nets, which allows the connection router to converge much faster to a solution. The connection router is compared with the VPR directed search router on the basis of VTR benchmarks on a modern commercial FPGA architecture. It is able to find routing solutions 4.4% faster for a relaxed routing problem and 84.3% faster for hard instances of the routing problem. And given the same amount of time as the VPR directed search, the connection router is able to find routing solutions with 5.8% less tracks per channel

    Analyzing the divide between FPGA academic and commercial results

    Get PDF
    The pinnacle of success for academic work is often achieved by having impact on commercial products. In order to have a successful transfer bridge, academic evaluation flows need to provide representative results of similar quality to commercial flows. A majority of publications in FPGA research use the same set of known academic CAD tools and benchmarks to evaluate new architecture and tool ideas. However, it is not clear whether the claims in academic publications based on these tools and benchmarks translate to real benefits in commercial products. In this work we compare the latest Xilinx commercial tools and products with these well-known academic tools to identify the gap in the major figures of merit. Our results show that there is a significant 2.2X gap in speed-performance for similar process technology. We have also identified the area-efficiency and runtime divide between commercial and academic tools to be 5% and 2.2X, respectively. We show that it is possible to improve portions of the academic flow such as ABC logic optimization to match the quality of commercial tools at the expense of additional runtime. Our results also show that depth reduction, which is often used as the main figure of merit for logic optimization papers does not translate to post-routing timing improvements. We finally discuss the differences between academic and commercial benchmark designs. We explain the main differences and trends that may influence the topic choice and conclusions of academic research. This work emphasizes how difficult it is to identify the relevant FPGA academic work that can provide meaningful benefits for commercial products

    Estimating circuit delays in FPGAs after technology mapping

    Get PDF
    An FPGA implementation requires a significant effort of the hardware designer, who optimizes FPGA designs by going through many time-consuming CAD flow iterations. These iterations provide two types of feedback: (1) the FPGA performance and (2) the identification of the parts having the highest impact on the FPGA performance. Both depend on the wirelength behavior. Studies have been dedicated to the estimation of local [5] and global [4] wirelengths, but to our knowledge both performance estimations and identification of the critical zone are not present in literature. Therefore this paper, firstly, presents a comparison of three performance estimation techniques: logic depth, Monte Carlo simulation and fast placement (ordered from low to high accuracy and runtime). Secondly, four methods identifying the critical zone are compared. Results show that Monte Carlo simulations provide a good identification of the parts having the highest impact on the performance. We conclude that Monte Carlo simulations provide useful feedback within a short runtime (about 30 times faster than placement), reducing the time-to-market of FPGA implementations

    Identification of dynamic circuit specialization opportunities in RTL code

    Get PDF
    Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming a set of its input signals are constant for a reasonable amount of time, leading to a smaller and faster FPGA circuit. When the signals actually change, a new circuit is loaded into the FPGA through runtime reconfiguration. The signals the design is specialized for are called parameters. For certain designs, parameters can be selected so the DCS implementation is both smaller and faster than the original implementation. However, DCS also introduces an overhead that is difficult for the designer to take into account, making it hard to determine whether a design is improved by DCS or not. This article presents extensive results on a profiling methodology that analyses Register-Transfer Level (RTL) implementations of applications to check if DCS would be beneficial. It proposes to use the functional density as a measure for the area efficiency of an implementation, as this measure contains both the overhead and the gains of a DCS implementation. The first step of the methodology is to analyse the dynamic behaviour of signals in the design, to find good parameter candidates. The overhead of DCS is highly dependent on this dynamic behaviour. A second stage calculates the functional density for each candidate and compares it to the functional density of the original design. The profiling methodology resulted in three implementations of a profiling tool, the DCS-RTL profiler. The execution time, accuracy, and the quality of each implementation is assessed based on data from 10 RTL designs. All designs, except for the two 16-bit adaptable Finite Impulse Response (FIR) filters, are analysed in 1 hour or less
    corecore